Inferring latent attributes of Twitter users with label regularization

نویسندگان

  • Ehsan Mohammady Ardehaly
  • Aron Culotta
چکیده

Inferring latent attributes of online users has many applications in public health, politics, and marketing. Most existing approaches rely on supervised learning algorithms, which require manual data annotation and therefore are costly to develop and adapt over time. In this paper, we propose a lightly supervised approach based on label regularization to infer the age, ethnicity, and political orientation of Twitter users. Our approach learns from a heterogeneous collection of soft constraints derived from Census demographics, trends in baby names, and Twitter accounts that are emblematic of class labels. To counteract the imprecision of such constraints, we compare several constraint selection algorithms that optimize classification accuracy on a tuning set. We find that using no user-annotated data, our approach is within 2% of a fully supervised baseline for three of four tasks. Using a small set of labeled data for tuning further improves accuracy on all tasks.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Homophily and Latent Attribute Inference: Inferring Latent Attributes of Twitter Users from Neighbors

In this paper, we extend existing work on latent attribute inference by leveraging the principle of homophily: we evaluate the inference accuracy gained by augmenting the user features with features derived from the Twitter profiles and postings of her friends. We consider three attributes which have varying degrees of assortativity: gender, age, and political affiliation. Our approach yields a...

متن کامل

Inferring User Preferences by Probabilistic Logical Reasoning over Social Networks

We propose a framework for inferring the latent attitudes or preferences of users by performing probabilistic first-order logical reasoning over the social network graph. Our method answers questions about Twitter users like Does this user like sushi? or Is this user a New York Knicks fan? by building a probabilistic model that reasons over user attributes (the user’s location or gender) and th...

متن کامل

Classifying Political Orientation on Twitter: It's Not Easy!

Numerous papers have reported great success at inferring the political orientation of Twitter users. This paper has some unfortunate news to deliver: while past work has been sound and often methodologically novel, we have discovered that reported accuracies have been systemically overoptimistic due to the way in which validation datasets have been collected, reporting accuracy levels nearly 30...

متن کامل

Using Word and Phrase Abbreviation Patterns to Extract Age From Twitter Microtexts

The wealth of texts available publicly online for analysis is ever increasing. Much work in computational linguistics focuses on syntactic, contextual, morphological and phonetic analysis on written documents, vocal recordings, or texts on the internet. Twitter messages present a unique challenge for computational linguistic analysis due to their constrained size. The constraint of 140 characte...

متن کامل

Predicting Latent User Attributes on Twitter

Online social networks contain a wealth of information about users that can be harnessed to provide users with personalized content. While most websites now personalize content, it is still not uncommon to have them characterize us incorrectly. We seek to extend previous work that focuses on better predicting latent user attributes for users in the Twitter social network with a flexible graphic...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2015